Medians and Order Statistics
- The
order statistic of a set of elements is the smallest element. For example, the minimum of a set of elements is the first order statistic , and the maximum is the order statistic . - A median, informally, is the “halfway
point” of the set., medians occur at
(the lower median) and (the upper median). - the selection problem as follows:
- Input: A set A of
(distinct) numbers and an integer , with . - Output: The element
that is larger than exactly other elements of . # Minimum and maximum
- Input: A set A of
- An upper bound of
comparisons 1
2
3
4
5
6MINIMUM(A)
1 min = A[1]
2 for i = 2 to A.length
3 if min > A[i]
4 min = A[i]
5 return min - the algorithm MINIMUM is optimal with respect to the number of comparisons performed
- Simultaneous minimum and maximum:
- We can find both the minimum and the maximum using at most
comparisons - We compare pairs of elements from the input first with each other, and then we compare the smaller with the current minimum and the larger to the current maximum, at a cost of 3 comparisons for every 2 elements
- If n is odd, we set both the minimum and maximum to the value of the first element, and then we process the rest of the elements in pairs.
- If n is even, we perform 1 comparison on the first 2 elements to determine the initial values of the minimum and maximum,
- We can find both the minimum and the maximum using at most
Selection in expected linear time
- We present a divide-and-conquer algorithm for the
selection problem
1
2
3
4
5
6
7
8
9
10
11RANDOMIZED-SELECT(A,p,r,i)
1 if p==r
2 return A[p]
3 q = RANDOMIZED-PARTITION(A,P,r)
4 k=q-p+1
5 if i==k // the pivot value is the answer
6 return A[q]
7 elseif i < k
8 return RANDOMIZED-SELECT(A, p,p-1,i)
9 else
10 return RANDOMIZED-SELECT(A,q+1,r,i-k) - RANDOMIZED-SELECT works on only one side of the partition
- its behavior is determined in part by the output of a
random-number generator. ### Analysis
has elements (all less than or equal to the pivot) with probability . For , we define indicator random variables where and so, assuming that the elements are distinct, we have When the two subarrays on which we might recurse have sizes and . Hence, we have the recurrence Taking expected values, we have Let us consider the expression . We have If is even, each term from to appears exactly twice in the summation, and if is odd, all these terms appear twice and appears once. Thus, we have: Assume that for some constant that satisfies the initial conditions of the recurrence. We assume that for less than some constant. term is bounded from above by this last expression is at most Thus, if we assume that for , then # Selection in worst-case linear time
The SELECT algorithm determines the
Divide the
elements of the input array into groups of 5 elements each and at most one group made up of the remaining mod 5 elements. Find the median of each of the
groups by first insertion-sorting the elements of each group (of which there are at most 5) and then picking the median from the sorted list of group elements. Use SELECT recursively to find the median
of the medians found in step 2. (If there are an even number of medians, then by our convention, is the lower median.) Partition the input array around the median-of-medians
using the modified version of PARTITION. Let be one more than the number of elements on the low side of the partition, so that is the smallest element and there are elements on the high side of the partition. If
, then return . Otherwise, use SELECT recursively to find the smallest element on the low side if , or the smallest element on the high side if .
Analysis
Thus, at least half of the